memory utilization



Prepared for the Unknown: Adapting AIOps Capacity Forecasting Models to Data Changes

Poenaru-Olaru, Lorena, Hof, Wouter van 't, Stando, Adrian, Trawinski, Arkadiusz P., Kapel, Eileen, Rellermeyer, Jan S., Cruz, Luis, van Deursen, Arie

arXiv.org Artificial Intelligence

Abstract--Capacity management is critical for software organizations to allocate resources effectively and meet operational demands. An important step in capacity management, predicting future resource needs, often relies on data-driven analytics and machine learning (ML) forecasting models, which require frequent retraining to stay relevant as data evolves. Continuously retraining the forecasting models can be expensive and difficult to scale, posing a challenge for engineering teams tasked with balancing accuracy and efficiency. Retraining only when the data changes appears to be a more computationally efficient alternative, but its impact on accuracy requires further investigation. In this work, we investigate the effects of retraining capacity forecasting models for time series based on detected changes in the data, compared to periodic retraining. Our results show that drift-based retraining achieves forecasting accuracy comparable to periodic retraining in most cases, making it a cost-effective strategy. However, when data is changing rapidly, periodic retraining is still preferred to maximize forecasting accuracy. These findings offer actionable insights for software teams to enhance forecasting systems, reducing retraining overhead while maintaining robust performance.

The term capacity management refers to ensuring that an IT service has sufficient infrastructure and resources to meet current and future demand. Although capacity management is crucial for efficient and effective service delivery, this process used to be carried out manually by continuously collecting and analyzing data [32]. Manual techniques for predicting capacity requirements become difficult to scale as the number of capacity management data sources increases, and they are significantly time-consuming for the engineers in charge. To automate capacity management for machine utilization, such as CPU and memory, companies have started employing forecasting AIOps models, which predict resource demand in a timely fashion. This is particularly relevant for our industry partner, ING (International Netherlands Group) Bank, where operational engineers must monitor numerous time series to ensure sufficient resources are allocated for its large-scale online operations, supported by thousands of machines with varying resource demands.
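To make the trade-off concrete, here is a minimal sketch of drift-triggered retraining. The Kolmogorov-Smirnov test stands in for whichever drift detector a team prefers (the paper studies drift-based retraining in general, not this specific test), and the names `drift_detected` and `maybe_retrain` are illustrative, not the authors' API.

```python
# Minimal sketch of drift-triggered retraining (illustrative, not the
# paper's pipeline): retrain only when a two-sample test flags a change
# between a reference window and the most recent window of the series.
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference: np.ndarray, recent: np.ndarray,
                   alpha: float = 0.05) -> bool:
    """Flag drift when a KS test rejects 'same distribution'."""
    _, p_value = ks_2samp(reference, recent)
    return p_value < alpha

def maybe_retrain(model, history: np.ndarray, window: int = 168):
    """Retrain the forecaster (any object with .fit) only on detected drift."""
    reference, recent = history[:-window], history[-window:]
    if drift_detected(reference, recent):
        model.fit(history)  # full retrain on the updated history
    return model
```

The periodic alternative simply calls `model.fit` on a fixed schedule; the drift-based variant pays for a cheap statistical test instead of a full retrain whenever the recent window looks like the reference data.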


A Real-Time Framework for Intermediate Map Construction and Kinematically Feasible Off-Road Planning Without OSM

Jerome, Otobong, Kulathunga, Geesara Prathap, Devitt, Dmitry, Murawjow, Eugene, Klimchik, Alexandr

arXiv.org Artificial Intelligence

Off-road environments present unique challenges for autonomous navigation due to their complex and unstructured nature. Traditional global path-planning methods, which typically aim to minimize path length and travel time, perform poorly on large-scale maps and fail to account for critical factors such as real-time performance, kinematic feasibility, and memory efficiency. This paper introduces a novel global path-planning method specifically designed for off-road environments, addressing these essential factors. The method begins by constructing an intermediate map within the pixel coordinate system, incorporating geographical features like off-road trails, waterways, restricted and passable areas, and trees. The planning problem is then divided into three sub-problems: graph-based path planning, kinematic feasibility checking, and path smoothing. This approach effectively meets real-time performance requirements while ensuring kinematic feasibility and efficient memory use. The method was tested in various off-road environments with large-scale maps up to several square kilometers in size, successfully identifying feasible paths in an average of 1.5 seconds and utilizing approximately 1.5 GB of memory under extreme conditions. The proposed framework is versatile and applicable to a wide range of off-road autonomous navigation tasks, including search and rescue missions and agricultural operations.
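The three sub-problems map naturally onto a short pipeline. The sketch below is a simplified stand-in, assuming graph nodes are (x, y) pixel coordinates: `networkx` shortest-path search for stage one, a crude heading-change filter in place of a real minimum-turning-radius check for stage two, and B-spline smoothing for stage three. None of the function names come from the paper.

```python
# Three-stage off-road planning pipeline (simplified stand-in, not the
# authors' implementation). Nodes are assumed to be (x, y) pixel tuples.
import math
import networkx as nx
import numpy as np
from scipy.interpolate import splprep, splev

def heading(p, q):
    return math.atan2(q[1] - p[1], q[0] - p[0])

def enforce_turning_limit(pts, max_turn_rad):
    """Drop waypoints whose heading change exceeds the limit (a crude
    proxy for a proper minimum-turning-radius feasibility check)."""
    out = [pts[0], pts[1]]
    for p in pts[2:]:
        turn = abs(heading(out[-2], out[-1]) - heading(out[-1], p))
        if min(turn, 2 * math.pi - turn) <= max_turn_rad:
            out.append(p)
    return out

def smooth(pts, n=100):
    """B-spline smoothing over the feasible waypoints (needs >= 4 points)."""
    tck, _ = splprep(np.asarray(pts, dtype=float).T, s=2.0)
    return np.stack(splev(np.linspace(0, 1, n), tck), axis=1)

def plan_path(graph: nx.Graph, start, goal, max_turn_rad=0.6):
    # 1) graph search, 2) kinematic filter, 3) smoothing
    waypoints = nx.shortest_path(graph, start, goal, weight="cost")
    return smooth(enforce_turning_limit(waypoints, max_turn_rad))
```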



Forecasting LLM Inference Performance via Hardware-Agnostic Analytical Modeling

Patwari, Rajeev, Sirasao, Ashish, Das, Devleena

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly deployed as local agents on personal devices with CPUs, NPUs, and integrated GPUs. However, forecasting inference performance on such heterogeneous devices remains challenging due to dynamic compute and memory demands. Existing approaches rely on GPU benchmarking or machine-learning-based latency predictors, which are often hardware-specific and lack generalizability. To this end, we introduce LIFE, a lightweight analytical framework built from modular operator models, configurable to characterize LLM inference workloads in a hardware- and dataset-agnostic manner. LIFE characterizes the influence of software and model optimizations, such as quantization, KV cache compression, LoRA adapters, chunked prefill, different attention mechanisms, and operator fusion, on performance metrics such as time-to-first-token (TTFT), time-per-output-token (TPOT), and tokens-per-second (TPS). LIFE enables performance forecasting using only hardware specifications, such as TOPS and memory bandwidth, without requiring extensive dataset benchmarking. We validate LIFE's forecasts with inference on AMD Ryzen CPUs, NPUs, and iGPUs and on NVIDIA V100 GPUs, using Llama2-7B variants, demonstrating LIFE's utility in forecasting LLM performance through the lens of system efficiency to enable efficient LLM deployment across different hardware platforms.
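As a flavor of what hardware-only forecasting looks like, here is a deliberately crude roofline-style estimate (my simplification, not LIFE's operator-level model): prefill is treated as compute-bound against TOPS, decode as memory-bound against DRAM bandwidth. The ~2 FLOPs/parameter/token rule of thumb and all example numbers are assumptions.

```python
# Back-of-envelope analytical forecast from hardware specs alone
# (illustrative simplification of the approach, not LIFE itself).
def forecast(params_b: float, bytes_per_param: float, prompt_len: int,
             tops: float, mem_bw_gbs: float):
    flops_per_token = 2 * params_b * 1e9                 # ~2 FLOPs/param/token
    ttft = prompt_len * flops_per_token / (tops * 1e12)  # compute-bound prefill
    tpot = params_b * 1e9 * bytes_per_param / (mem_bw_gbs * 1e9)  # weight reads
    return ttft, tpot, 1.0 / tpot                        # TTFT(s), TPOT(s), TPS

# e.g. a 7B model at 4-bit (0.5 B/param) on a hypothetical 50 TOPS NPU
# with 120 GB/s memory: ~0.14 s TTFT, ~29 ms TPOT, ~34 TPS.
ttft, tpot, tps = forecast(7.0, 0.5, prompt_len=512, tops=50, mem_bw_gbs=120)
```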


From Computation to Consumption: Exploring the Compute-Energy Link for Training and Testing Neural Networks for SED Systems

Douwes, Constance, Serizel, Romain

arXiv.org Artificial Intelligence

The massive use of machine learning models, particularly neural networks, has raised serious concerns about their environmental impact. Indeed, over the last few years we have seen an explosion in the computing costs associated with training and deploying these systems. It is therefore crucial to understand their energy requirements in order to better integrate them into the evaluation of models, which has so far focused mainly on performance. In this paper, we study several neural network architectures that are key components of sound event detection (SED) systems, using an audio tagging task as an example. We measure the energy consumption for training and testing small to large architectures and characterize the complex relationships between energy consumption, the number of floating-point operations, the number of parameters, and GPU/memory utilization.
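One common way to obtain such energy measurements on NVIDIA hardware is to integrate sampled power draw via NVML. The sketch below uses the `pynvml` bindings and is an assumption about tooling, not necessarily the authors' measurement setup.

```python
# Sketch: estimate GPU energy for a workload by sampling power draw
# (assumes an NVIDIA GPU and the pynvml bindings; not the paper's setup).
import threading
import time
import pynvml

def measure_energy_joules(run_fn, gpu_index: int = 0, interval_s: float = 0.1):
    """Run run_fn while a background thread integrates power over time."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    energy_mj = [0.0]            # nvmlDeviceGetPowerUsage returns milliwatts
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            energy_mj[0] += pynvml.nvmlDeviceGetPowerUsage(handle) * interval_s
            time.sleep(interval_s)

    t = threading.Thread(target=sampler)
    t.start()
    run_fn()                     # the training or testing workload
    stop.set()
    t.join()
    pynvml.nvmlShutdown()
    return energy_mj[0] / 1000.0  # mW * s -> joules
```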


Ensemble Method for System Failure Detection Using Large-Scale Telemetry Data

Mudgal, Priyanka, Wouhaybi, Rita H.

arXiv.org Artificial Intelligence

The growing reliance on computer systems, particularly personal computers (PCs), necessitates heightened reliability to uphold user satisfaction. This paper presents an in-depth analysis of extensive system telemetry data and proposes an ensemble methodology for detecting system failures. Our approach scrutinizes various system metrics, encompassing CPU utilization, memory utilization, disk activity, and CPU temperature, alongside pertinent system metadata such as system age, usage patterns, core count, and processor type. The proposed ensemble technique integrates a diverse set of algorithms, including Long Short-Term Memory (LSTM) networks, isolation forests, one-class support vector machines (OCSVM), and local outlier factors (LOF), to effectively discern system failures. Specifically, the LSTM network, together with the other machine learning techniques, is trained on Intel Computing Improvement Program (ICIP) telemetry software data to distinguish between normal and failed system patterns. Experimental evaluations demonstrate the efficacy of our models, achieving a high detection rate in identifying system failures. Our research contributes to advancing the field of system reliability and offers practical insights for enhancing user experience in computing environments.
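The unsupervised half of such an ensemble is straightforward to express with scikit-learn. The sketch below combines the three non-LSTM detectors by majority vote; the paper's features, thresholds, and LSTM branch are omitted, and `min_votes` is an illustrative knob rather than the authors' fusion rule.

```python
# Majority-vote ensemble of anomaly detectors over telemetry features
# (hedged sketch of the non-LSTM part of such an ensemble).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

def ensemble_flags(X_train: np.ndarray, X_test: np.ndarray, min_votes: int = 2):
    """Fit detectors on normal telemetry; flag test rows as failures
    when at least min_votes detectors call them outliers."""
    detectors = [
        IsolationForest(random_state=0).fit(X_train),
        OneClassSVM(nu=0.05).fit(X_train),
        LocalOutlierFactor(novelty=True).fit(X_train),
    ]
    # Each detector returns -1 for outliers, +1 for inliers.
    votes = sum((d.predict(X_test) == -1).astype(int) for d in detectors)
    return votes >= min_votes
```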


Achieving Pareto Optimality using Efficient Parameter Reduction for DNNs in Resource-Constrained Edge Environment

Mih, Atah Nuh, Rahimi, Alireza, Kawnine, Asfia, Palma, Francis, Wachowicz, Monica, Dubay, Rickey, Cao, Hung

arXiv.org Artificial Intelligence

This paper proposes an optimization of an existing Deep Neural Network (DNN) that improves its hardware utilization and facilitates on-device training for resource-constrained edge environments. We implement efficient parameter reduction strategies on Xception that shrink the model size without sacrificing accuracy, thus decreasing memory utilization during training. We evaluate our model in two experiments, Caltech-101 image classification and PCB defect detection, and compare its performance against the original Xception and two lightweight models, EfficientNetV2B1 and MobileNetV2. On Caltech-101 image classification, our model achieves a better test accuracy (76.21%) than Xception (75.89%), uses less memory on average (847.9 MB versus 874.6 MB), and has faster training and inference times. The lightweight models overfit, with EfficientNetV2B1 reaching 30.52% test accuracy and MobileNetV2 reaching 58.11%; both use less memory than our model and Xception. On PCB defect detection, our model has the best test accuracy (90.30%), compared to Xception (88.10%), EfficientNetV2B1 (55.25%), and MobileNetV2 (50.50%). MobileNetV2 has the lowest average memory usage (849.4 MB), followed by our model (865.8 MB), then EfficientNetV2B1 (874.8 MB), with Xception the highest (893.6 MB). We further experiment with pre-trained weights and observe that memory usage decreases, showing the benefits of transfer learning. A Pareto analysis of the models' performance shows that our optimized architecture satisfies both the accuracy and the low-memory-utilization objectives.
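The Pareto analysis itself is a simple dominance check over (accuracy, memory) pairs. The sketch below reproduces it on the PCB numbers quoted above; "ours" labels the abstract's optimized model.

```python
# Pareto-front check over (maximize accuracy %, minimize memory MB),
# using the PCB defect detection figures quoted in the abstract.
models = {
    "ours": (90.30, 865.8), "Xception": (88.10, 893.6),
    "EfficientNetV2B1": (55.25, 874.8), "MobileNetV2": (50.50, 849.4),
}

def pareto_front(entries):
    """Keep models not dominated by another with accuracy >= and memory <=."""
    front = []
    for name, (acc, mem) in entries.items():
        dominated = any(a >= acc and m <= mem and (a, m) != (acc, mem)
                        for a, m in entries.values())
        if not dominated:
            front.append(name)
    return front

print(pareto_front(models))  # -> ['ours', 'MobileNetV2']
```

Xception and EfficientNetV2B1 are both dominated by the optimized model (lower accuracy and higher memory), which is what places it on the front.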


RCMHA: Relative Convolutional Multi-Head Attention for Natural Language Modelling

Sugiharto, Herman, Aradea, Mubarok, Husni

arXiv.org Artificial Intelligence

The attention module is in common use in language modeling and presents distinct challenges within the broader scope of Natural Language Processing. Multi-Head Attention (MHA) employs absolute positional encoding, which imposes limitations on token length and entails substantial memory consumption when processing embedded inputs. The remedy proposed by prior work is relative positional encoding, as adopted in Transformer-XL or Relative Multi-Head Attention (RMHA), although those architectures still consume considerable memory. To address these challenges, this study refines MHA by combining relative positional encoding with a depth-wise convolutional layer architecture, aiming for higher accuracy with lower memory usage. The proposed RCMHA framework modifies two components: first, a depth-wise convolutional layer is applied to the input embedding, encompassing the Query, Key, and Value parameters; second, relative positional encoding is incorporated into the attention scoring phase and integrated with scaled dot-product attention. Experiments underscore the advantages of RCMHA: it exhibits superior accuracy, scoring 0.572 against alternative attention modules such as MHA, Multi-DConv-Head Attention (MDHA), and RMHA. Concerning memory utilization, RCMHA is the most frugal, consuming 2.98 GB on average, versus the 3.5 GB required by RMHA.
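In PyTorch-style code, the two modifications read roughly as follows. Tensor shapes, the kernel size, and the learned relative-bias form are my simplifications rather than the exact RCMHA architecture.

```python
# Sketch of RCMHA's two modifications: a depth-wise conv over the input
# embedding before the Q/K/V projections, and a relative positional bias
# added to the attention scores (simplified, not the authors' code).
import torch
import torch.nn as nn

class RCMHASketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, max_len=512, kernel=3):
        super().__init__()
        self.h, self.dk = n_heads, d_model // n_heads
        # Depth-wise conv along the sequence dimension of the embedding.
        self.dwconv = nn.Conv1d(d_model, d_model, kernel,
                                padding=kernel // 2, groups=d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Learned relative position bias, one scalar per offset.
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq, d_model)
        b, n, d = x.shape
        x = self.dwconv(x.transpose(1, 2)).transpose(1, 2)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.h, self.dk).transpose(1, 2)
        k = k.view(b, n, self.h, self.dk).transpose(1, 2)
        v = v.view(b, n, self.h, self.dk).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.dk ** 0.5
        idx = torch.arange(n, device=x.device)
        rel = self.rel_bias[idx[None, :] - idx[:, None] + n - 1]  # (n, n)
        attn = torch.softmax(scores + rel, dim=-1)  # scaled dot-product + bias
        y = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.out(y)
```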


Knowing Is Half the Battle

#artificialintelligence

Miniaturization has revolutionized computing ever since the advent of the first digital computer. The machines that once filled entire rooms and required teams of engineers to operate are now vastly dwarfed in capability by the smartphones we carry in our pockets. That trend toward miniaturization is now transforming the field of machine learning as well. Image and voice recognition, object detection, predictive maintenance, and many other applications that once required complex algorithms running on huge computing resources in the cloud can now run on a tiny microcontroller in a low-power IoT device. In part, these feats have been achieved thanks to advances in computing power, but hardware alone is not enough.